Expressive speech synthesis: synthesising ambiguity
Abstract
Previous work in HCI has shown that ambiguity, normally avoided in interaction design, can contribute to a user’s engagement by increasing interest and uncertainty. In this work, we create and evaluate synthetic utterances in which the text content conflicts with the emotion in the voice. We show that: 1) text content measurably alters the negative/positive perception of a spoken utterance, 2) changes in voice quality also produce this effect, and 3) when voice quality and text content conflict, the result is a synthesised ambiguous utterance. Results were analysed using an evaluation/activation space. Whereas the effect of text content was restricted to the negative/positive dimension (valence), voice quality also had a significant effect on how active or passive the utterance was perceived to be (activation).
Similar resources
Towards synthesising expressive speech; designing and collecting expressive speech data
Corpus-based speech synthesis needs representative corpora of human speech if it is to meet the needs of everyday spoken interaction. This paper describes methods for recording such corpora, and details some difficulties (with their solutions) found in the use of spontaneous speech data for synthesis.
Synthesising and Evaluating Cross-Modal Emotional Ambiguity in Virtual Agents
Emotional ambiguity, when more than one emotion appears present at a given time, or several emotions are superimposed, is common in human interaction and effects such as irony can be intentionally created through a mismatch of such emotional signals. High quality emotional speech synthesis offers a means for testing the effect of combining differences in vocal emotion, facial expression and tex...
Enabling controllability for continuous expression space
A continuous expression space assumes that each utterance contains individual expressions. Thus, it can be used to model detailed expression information in speech data. However, since an infinite number of different expressions can be contained in the continuous expression space, it is very difficult to manually label them. That means, these expressions are very hard to identify and to extract ...
Alert!... Calm Down, There is Nothing to Worry About. Warning and Soothing Speech Synthesis
The presence of appropriate acoustic cues of affective features in synthesized speech can be a prerequisite for the proper evaluation of the semantic content by the message recipient. In recent work, the authors have focused on expressive speech synthesis capable of generating natural-sounding synthetic speech at various levels of arousal. The synthesizer should be able to ...
Synthesising Uncertainty: The Interplay of Vocal Effort and Hesitation Disfluencies
As synthetic voices become more flexible, and conversational systems gain more potential to adapt to the environmental and social situation, it becomes necessary to examine how different modifications to the synthetic speech interact with each other and how their specific combinations influence perception. This work investigates how the vocal effort of the synthetic speech together with adde...